Skip to content

Conversation

@jorund1
Copy link
Collaborator

@jorund1 jorund1 commented Oct 28, 2025

Scope and purpose

Fixes #2373. Another PR intends to add documentation and release notes.

This pull request

  • Adds DHCP stats graphs to VLAN and Prefix pages in NAV where recent-enough stats for that VLAN/Prefix are found in the Carbon/Graphite database:
    dhcp-frontend-delimited-timeseries-showcase

  • Changes how paths are stored under "nav.dhcp" in the Carbon/Graphite database:

    • previously, paths followed this pattern:
      nav.dhcp.4.pools.<server-name>.<pool-name>.<first-ip>.<last-ip>.{assigned,total,declined}
      
    • now, paths follow this pattern:
      nav.dhcp.4.<server-name>.{range,pool,subnet}.{special_groups,custom_groups}.<group-name>.<first-ip>.<last-ip>.{assigned,total,declined}
      
      Basically, we
      • Rename <pool-name> to <group-name> in all pieces of documentation.
      • Differentiate between a range, a pool, and a subnet, because different DHCP servers have different semantics and data-coarseness. A range is the set of all IPs between a first-ip and a last-ip (inclusive), whereas a pool is an arbitrary set of IPs greater than or equal to a first-ip and less than or equal to a last-ip. Further, all IPs in a range and a pool are available for lease, whereas this is not necessarily the case for a subnet.
      • Add an additional segment {special_groups,custom_groups} to differentiate between group-names that NAV must create and manage itself and group-names that NAV is able to infer from some piece of externally sourced information such as DHCP server config files.
  • Expects [server_<server-name>] sections in dhcpstats.conf instead of [endpoint_<server-name>] for consistency

    • The new code has backwards-compability with the old naming-scheme
  • Changes the name of the user_context_poolname_key option in dhcpstats.conf to user_context_groupname_key.

    • The new code does not have backwards-compability with the old naming-scheme here because there is already breaking changes to Graphite paths. Also, NAV works fine when this option is absent.

Contributor Checklist

Every pull request should have this checklist filled out, no matter how small it is.
More information about contributing to NAV can be found in the
Hacker's guide to NAV.

  • Added a changelog fragment for towncrier
  • Added/amended tests for new/changed code
  • Added/changed documentation See independent PR
  • Linted/formatted the code with ruff, easiest by using pre-commit
  • Wrote the commit message so that the first line continues the sentence "If applied, this commit will ...", starts with a capital letter, does not end with punctuation and is 50 characters or less long. See https://cbea.ms/git-commit/
  • Based this pull request on the correct upstream branch: For a patch/bugfix affecting the latest stable version, it should be based on that version's branch (<major>.<minor>.x). For a new feature or other additions, it should be based on master.
  • If applicable: Created new issues if this PR does not fix the issue completely/there is further work to be done TODO: add a new issue for using Rickshaw graphs instead of PNG images in graph navlets
  • If it's not obvious from a linked issue, describe how to interact with NAV in order for a reviewer to observe the effects of this change first-hand (commands, URLs, UI interactions) See screencast below
  • If this results in changes in the UI: Added screenshots of the before and after
  • If this adds a new Python source code file: Added the boilerplate header to that file

Testing It For Yourself

You can run this script to fill the prefixes 192.0.2.0/24 and 198.51.100.0/24 with DHCP stats (you might need to install socat and gawk, and you might need to change 127.0.0.1 to the address of your Carbon/Graphite database):

#!/bin/sh

cat <<EOF |
kea-trd staff 192.0.2.1 192.0.2.16 16
kea-trd staff 192.0.2.17 192.0.2.64 48
kea-trd staff 192.0.2.65 192.0.2.126 64
kea-trd guest 192.0.2.200 192.0.2.255 56
kea-trd guest 198.51.100.1 198.51.100.255 255
kea-osl staff 198.51.100.1 198.51.100.255 255
EOF

gawk '
BEGIN {
  NOW = systime()
  INTERVAL = 300
  DAYS = 30
}

{
  a = rand()
  b = rand()
  c = rand()
  _ = 1 / (a + b + c)
  a = a * _
  b = b * _
  c = c * _
  d = rand()*3.14*2
  e = 20 + rand()*8
  f = rand()

  server_name = $1
  group_name = $2
  first_ip = $3; gsub(/\./, "_", first_ip)
  last_ip = $4; gsub(/\./, "_", last_ip)
  total_ips = $5
  noise = rand()
  for (offset = 0; offset < 3600*24*DAYS; offset += INTERVAL) {
    sine = (sin((offset*3.14*2)/(3600*e) + d)+1)/2
    noise = noise*f + rand()*(1-f)
    assigned_ips = int(sine*total_ips*a+total_ips*b+noise*total_ips*c)
    printf("nav.dhcp.4.%s.range.custom_groups.%s.%s.%s.assigned %d %d\n", server_name, group_name, first_ip, last_ip, assigned_ips, NOW - offset)
    printf("nav.dhcp.4.%s.range.custom_groups.%s.%s.%s.total %d %d\n", server_name, group_name, first_ip, last_ip, total_ips, NOW - offset)
  }
}' |

socat - tcp:127.0.0.1:2003

Here's a screencast:

screencast.mp4

At 2:00 in the video, notice that the graph labelled DHCP ranges in 'guest' on server 'kea-trd' has stats from ranges in both 192.0.2.0/24 and 198.51.100.0/24 despite vlan 20 only containing 192.0.2.0/24. This is because whenever a group (here: 'guest' on server 'kea-trd') has at least one range/pool/subnet overlapping with the VLAN/Prefix, the whole group will be displayed on that VLAN/Prefix's page.

This way, we can pass wildcard strings such as "*" and "{foo,bar}" to graphite
metric template functions and be assured that they won't be escaped to "_" and
"_foo_bar_".
Before, we only needed to translate from runtime data to Graphite paths, so we
only needed the metric_path_for_dhcp() function. But now we'll need to start
translating from Graphite paths back to runtime data because the Graphite paths
contains the <first_ip> and <last_ip> segments used to decide which
VLANs/Prefixes a stat belongs to. Further, the Graphite paths also contains the
<server_name>, <allocation_type>, <ip_version>, <group_name>, and
<group_name_source> segments used to decide which graph on that specific
VLAN/Prefix page a stat should be displayed in.

Thus this commit adds a new class, DhcpPath, which basically has the purpose of
containing runtime path data and translating between path data sourced from
external sources (DHCP server APIs), runtime data, and Graphite paths, like so:

  external path data ---> runtime path data <---> Graphite path
     (DHCP server)           (DhcpPath)            (Graphite)
More specifically, use the class to do the following translations:

  external path data ---> runtime path data ---> Graphite path
     (Kea API)               (DhcpPath)           (Graphite)

Thus delegating validity checks and transformations.

Also, this commit renames some variables and comments to stay consistent
with the variables and comments used in DhcpPath.
Because we didn't previously test for the case when a Kea DHCP configuration
file *does* contains a pool with a user-context object but that *does not*
contain the "group" key wanted by the kea_dhcp.py client.
@jorund1 jorund1 force-pushed the dhcpstats-frontend branch from 3439a9d to 90682bc Compare October 28, 2025 17:30
@codecov
Copy link

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 84.89796% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.04%. Comparing base (34c2f3b) to head (800fb70).
⚠️ Report is 56 commits behind head on master.

Files with missing lines Patch % Lines
python/nav/dhcpstats/common.py 83.07% 22 Missing ⚠️
python/nav/bin/dhcpstats.py 41.66% 7 Missing ⚠️
python/nav/metrics/graphs.py 75.00% 4 Missing ⚠️
python/nav/dhcpstats/kea_dhcp.py 93.93% 2 Missing ⚠️
python/nav/web/info/prefix/views.py 90.00% 1 Missing ⚠️
python/nav/web/info/vlan/views.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3633      +/-   ##
==========================================
+ Coverage   62.67%   63.04%   +0.36%     
==========================================
  Files         611      612       +1     
  Lines       45102    45341     +239     
  Branches       43       43              
==========================================
+ Hits        28268    28583     +315     
+ Misses      16824    16748      -76     
  Partials       10       10              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

This is part of simplifying the terminology used in the up-and-coming
documentation and configuration files for dhcpstats and its frontend;
terminology in the implementation and terminology used in the (user) interface
termonology should diverge as little as possible.

The documentation has more or less used the terms 'endpoint' and '(DHCP) server'
interchangeably. We fix this so that only the term '(DHCP) server' is used.
Where the user-facing interfaces are affected, backwards compability is added to
avoid the most dramatical errors.
We really never need get-config-hash to work; it's just used to spare one
round-trip. Catching dhcpstats.errors.CommunicationError means we now should be
gracefully handling any API errors and any HTTP errors; a superset of what we
did earlier.
This type represents a graphite metric; a type which is useful
in other modules than just dhcpstats/kea_dhcp.py.
This fix allows HTTPS schemes with one or more uppercased letters to also be
detected.
This is mostly so that we're able to support HTTP Basic Authentication in the
kea_dhcp.py client where the password is the empty string.
These can be used instead of string interpolations sprinkled among other
business code, which quickly becomes unwieldy when you want a little bit more
advanced Graphite renders.
The function fetch_graph_urls_for_prefixes takes a set of prefixes
(e.g. obtained from a models.manage.Prefix or from a models.manage.Vlan) and
returns one url per DHCP graph from Graphite related to one or more of the
prefixes. Each url returns a JSON with graph data.
..a VLAN/Prefix page has DHCP graphs displayed if NAV has collected
some recent enough DHCP stats for IP addresses that intersect that
VLAN/Prefix.
@jorund1 jorund1 force-pushed the dhcpstats-frontend branch from 90682bc to db85e5a Compare October 28, 2025 18:05
@jorund1 jorund1 force-pushed the dhcpstats-frontend branch from db85e5a to 50cb13a Compare October 28, 2025 18:29
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 5, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Comment on lines +147 to +148
first_ip = IPy.IP(IPy.IP(first_ip)[0])
last_ip = IPy.IP(IPy.IP(last_ip)[-1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be unnecessary. Inputting prefixes into the first_ip and last_ip arguments should be considered an error, rather than trying to do some weird guesstimation and allow it...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this is in the blocking category (these two lines just look weird)

Comment on lines +188 to +195
for prefix in prefixes:
if (
self.first_ip in prefix
or self.last_ip in prefix
or self.first_ip < prefix < self.last_ip
):
return True
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could potentially use any() here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is non-blocking, just a suggestion

Copy link
Member

@lunkwill42 lunkwill42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the code seems nicely structured. After a long F2F session, I would say my most important suggestion is to split this into two PRs: One for the rewrite of the backend functionality, and one for the purely front-end related additions.

Lastly, I understand there is also a documentation PR coming, which is nice.

@lunkwill42
Copy link
Member

Overall the code seems nicely structured. After a long F2F session, I would say my most important suggestion is to split this into two PRs: One for the rewrite of the backend functionality, and one for the purely front-end related additions.

Lastly, I understand there is also a documentation PR coming, which is nice.

Also a reminder to properly enable the pre-commit hooks in each clone you have: Linting failures abound here...

@lunkwill42
Copy link
Member

Re: Tip about splitting into multiple PRs: I cannot recall whether you have permission to push branches on the upstream repo, but if you do, it makes it easier to build PRs that depend on each other. The first PR would have master as its base. The second PR would have the first PR's branch as its base. This makes the diffs more manageable for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add functionality to NAV to graph DHCP lease stats on each VLAN page in NAV

3 participants